In [196]:
%%capture
%run "5 - Statistics.ipynb"
%run "8 - Gradient Descent.ipynb"
import matplotlib.pyplot as plt
import random
%matplotlib inline

The Model

Let us define a simple prediction function that takes two constants, $\alpha$ and $\beta$, plus an input $x_i$:


In [197]:
def predict(alpha, beta, x_i):
    return beta * x_i + alpha
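
For example, with the purely illustrative values $\alpha = 22.9$ and $\beta = 0.9$, a user with 10 friends would be predicted to spend $0.9 \cdot 10 + 22.9 = 31.9$ minutes per day:

predict(22.9, 0.9, 10) # 31.9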

What should we use for alpha and beta? If we know the actual output y_i, we can calculate the error of our prediction for that input:


In [198]:
def error(alpha, beta, x_i, y_i):
    return y_i - predict(alpha, beta, x_i)
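
Continuing the illustrative example: if that user actually spends 35 minutes per day, our prediction is off by $35 - 31.9 = 3.1$ minutes:

error(22.9, 0.9, 10, 35) # 3.1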

What we really care about is the total error over the entire data set. We don't just add up the raw errors (positive and negative errors would cancel out); instead we sum the squared errors:


In [199]:
def sum_of_squared_errors(alpha, beta, x, y):
    return sum(error(alpha, beta, x_i, y_i)**2 for x_i, y_i in zip(x, y))
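
Written out, this is the quantity we want to minimize:

$$\mathrm{SSE}(\alpha, \beta) = \sum_i \bigl(y_i - (\beta x_i + \alpha)\bigr)^2$$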

Now we just need to choose the values of alpha and beta that make the sum of squared errors as small as possible. The least-squares solution has a simple closed form:


In [200]:
def least_squares_fit(x, y):
    beta = correlation(x, y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha, beta
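
These formulas are what you get by setting the partial derivatives of the sum of squared errors to zero (the calculus is only sketched here):

$$\frac{\partial\,\mathrm{SSE}}{\partial \alpha} = 0 \;\Rightarrow\; \alpha = \bar{y} - \beta\,\bar{x}, \qquad
\frac{\partial\,\mathrm{SSE}}{\partial \beta} = 0 \;\Rightarrow\; \beta = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \operatorname{corr}(x, y)\,\frac{\sigma_y}{\sigma_x}$$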

In [201]:
alpha, beta = least_squares_fit(num_friends_clean, daily_minutes_clean)
alpha, beta


Out[201]:
(22.94755241346903, 0.903865945605865)

In [202]:
predict(alpha, beta, 20)


Out[202]:
41.02487132558633

In [203]:
plt.title('Simple Linear Regression Model');
plt.ylabel('minutes per day');
plt.xlabel('# of friends');
plt.scatter(num_friends_clean, daily_minutes_clean);
plt.plot(range(0, 50), [predict(alpha, beta, x) for x in range(0, 50)], color='green');


Our model is pretty good for how simple it is! We can measure how well a model fits the data using the coefficient of determination (a.k.a. R-squared), which measures the fraction of the total variation in the dependent variable that is captured by the model.
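
In symbols, with $\hat{y}_i = \beta x_i + \alpha$ and $\bar{y}$ the mean of the $y_i$:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$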


In [204]:
def total_sum_of_squares(y):
    """the total squared variation of y_i's from their mean"""
    return sum(v ** 2 for v in de_mean(y))

def r_squared(alpha, beta, x, y):
    """the fraction of variation in y captured by the model, which equals
    1 - the fraction of variation in y not captured by the model"""
    return 1.0 - (sum_of_squared_errors(alpha, beta, x, y) / total_sum_of_squares(y))

r_squared(alpha, beta, num_friends_clean, daily_minutes_clean) # 0.329


Out[204]:
0.3291078377836305

Higher R-squared values indicate a better-fitting model. The highest possible value is 1, which would mean the model predicts every observation exactly; and since the least-squares model does at least as well as always predicting mean(y), its R-squared can't drop below 0.

Using Gradient Descent

We can also estimate alpha and beta with stochastic gradient descent, minimizing the squared error one data point at a time:
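
The gradient of a single point's squared error with respect to $\theta = (\alpha, \beta)$ follows from the chain rule; writing $\varepsilon_i = y_i - (\beta x_i + \alpha)$ for the error:

$$\frac{\partial}{\partial \alpha}\,\varepsilon_i^2 = -2\,\varepsilon_i, \qquad \frac{\partial}{\partial \beta}\,\varepsilon_i^2 = -2\,\varepsilon_i\,x_i$$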


In [205]:
def squared_error(x_i, y_i, theta):
    alpha, beta = theta
    return error(alpha, beta, x_i, y_i)**2

def squared_error_gradient(x_i, y_i, theta):
    alpha, beta = theta
    return [-2 * error(alpha, beta, x_i, y_i), # alpha partial derivative
            -2 * error(alpha, beta, x_i, y_i) * x_i] # beta partial derivative

# choose random values to start
random.seed(0)
theta = [random.random(), random.random()]
alpha, beta = minimize_stochastic(squared_error, squared_error_gradient,
                                  num_friends_clean, daily_minutes_clean,
                                  theta, 0.0001)
alpha, beta


Out[205]:
(22.93746417548679, 0.9043371597664965)
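
As a quick sanity check (a sketch using made-up point and parameter values), the analytic gradient should closely match a central-difference approximation:

# hypothetical point and parameter values, used only for this check
x_check, y_check, theta_check = 5, 25, [22.9, 0.9]
h = 1e-6
numeric_gradient = [
    (squared_error(x_check, y_check, [theta_check[0] + h, theta_check[1]]) -
     squared_error(x_check, y_check, [theta_check[0] - h, theta_check[1]])) / (2 * h),
    (squared_error(x_check, y_check, [theta_check[0], theta_check[1] + h]) -
     squared_error(x_check, y_check, [theta_check[0], theta_check[1] - h])) / (2 * h),
]
numeric_gradient, squared_error_gradient(x_check, y_check, theta_check) # both approximately [4.8, 24.0]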

In [ ]: